Abstract

Cox's well-known theorem justifying the use of probability is shown not to hold in 
finite domains. The counterexample also suggests that Cox's assumptions are insufficient 
to prove the result even in infinite domains. The same counterexample is used to disprove 
a result of Fine on comparative conditional probability. 

1. Introduction 

One of the best-known and seemingly most compelling justifications of the use of probability 
is given by Cox (1946). Suppose we have a function Bel that associates a real number with
each pair (U, V) of subsets of a domain W such that U ≠ ∅. We write Bel(V|U) rather than
Bel(U, V), since we think of Bel(V|U) as the credibility or likelihood of V given U (see
footnote 1). Cox further assumes that Bel(V̄|U) is a function of Bel(V|U) (where V̄ denotes
the complement of V in W), that is, there is a function S such that

A1. Bel(V̄|U) = S(Bel(V|U)) if U ≠ ∅,

and that Bel(V ∩ V'|U) is a function of Bel(V'|V ∩ U) and Bel(V|U), that is, there is a
function F such that

A2. Bel(V ∩ V'|U) = F(Bel(V'|V ∩ U), Bel(V|U)) if V ∩ U ≠ ∅.

Notice that if Bel is a probability function, then we can take S(x) = 1 - x and F(x, y) =
xy. Cox makes much weaker assumptions: he assumes that F is twice differentiable, with a
continuous second derivative, and that S is twice differentiable. Under these assumptions,
he shows that Bel is isomorphic to a probability distribution in the sense that there is a
continuous one-to-one onto function g : ℝ → ℝ such that g ∘ Bel is a probability distribution
on W, and

g(Bel(V|U)) × g(Bel(U)) = g(Bel(V ∩ U)) if U ≠ ∅, (1)

where Bel(U) is an abbreviation for Bel(U|W).
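As a concrete illustration of A1, A2, and (1), the following Python sketch checks them for an ordinary conditional probability on a small domain, taking S(x) = 1 - x, F(x, y) = xy, and g to be the identity. The three-point domain, its weights, and the helper names (`mass`, `bel`, `subsets`, `check_axioms`) are assumptions made purely for illustration; none of them comes from Cox's paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical three-point domain with arbitrary positive weights.
W = frozenset({"a", "b", "c"})
weight = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(3)}

def mass(U):
    """Total weight of the set U."""
    return sum((weight[w] for w in U), Fraction(0))

def bel(V, U):
    """Ordinary conditional probability of V given a nonempty U."""
    return mass(V & U) / mass(U)

def subsets(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def S(x):
    return 1 - x

def F(x, y):
    return x * y

def check_axioms():
    """Return True if A1, A2 and (1) (with g the identity) hold everywhere."""
    for U in subsets(W):
        if not U:
            continue
        for V in subsets(W):
            # A1: Bel(complement of V | U) = S(Bel(V|U))
            if bel(W - V, U) != S(bel(V, U)):
                return False
            # (1) with g the identity: Bel(V|U) * Bel(U) = Bel(V ∩ U)
            if bel(V, U) * bel(U, W) != bel(V & U, W):
                return False
            if V & U:
                # A2: Bel(V ∩ V'|U) = F(Bel(V'|V ∩ U), Bel(V|U))
                for V2 in subsets(W):
                    if bel(V & V2, U) != F(bel(V2, V & U), bel(V, U)):
                        return False
    return True

print(check_axioms())
```

Exact rational arithmetic is used so that the equalities are tested literally rather than up to rounding.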

Not surprisingly, Cox's result has attracted a great deal of interest, particularly in the 
maximum entropy community and, more recently, in the AI community. For example 

1. Cox writes V|U rather than Bel(V|U), and takes U and V to be propositions in some language rather
than events, i.e., subsets of a given set. This difference is minor; there are well-known mappings from
propositions to events, and vice versa. I use events here since they are more standard in the probability 
literature. 
• Cheeseman (1988) has called it the "strongest argument for use of standard (Bayesian)
probability theory". Similar sentiments are expressed by Jaynes (1978, p. 24); indeed, 
Cox's Theorem is one of the cornerstones of Jaynes' recent book (1996). 

• Horvitz, Heckerman, and Langlotz (1986) used it as a basis for comparison of probability
and other nonprobabilistic approaches to reasoning about uncertainty.

• Heckerman (1988) used it as a basis for providing an axiomatization for belief update. 

The main contribution of this paper is to show (by means of an explicit counterexample) 
that Cox's result does not hold in finite domains, even under strong assumptions on S and 
F (stronger than those made by Cox and those made in all papers proving variants of Cox's 
results). Since finite domains are arguably those of most interest in AI applications, this 
suggests that arguments for using probability based on Cox's result — and other justifications 
similar in spirit — must be taken with a grain of salt, and their proofs carefully reviewed. 
Moreover, the counterexample suggests that Cox's assumptions are insufficient to prove the 
result even in infinite domains. 

It is known that some assumptions regarding F and S must be made to prove Cox's 
result. Dubois and Prade (1990) give an example of a function Bel, defined on a finite 
domain, that is not isomorphic to a probability distribution. For this choice of Bel, we can 
take F(x, y) = min(x, y) and S(x) = 1 - x. Since min is not twice differentiable, Cox's
assumptions block the Dubois-Prade example. 

Other authors have made different assumptions. Aczel (1966, Section 7 (Theorem 1))
does not make any assumptions about F, but he does make two other assumptions, each
of which blocks the Dubois-Prade example. The first is that Bel(V|U) takes on every
value in some range [e, E], with e < E. In the Dubois-Prade example, the domain is finite,
so this certainly cannot hold. The second is that if V and V' are disjoint, then there is a
continuous function G : ℝ² → ℝ, strictly increasing in each argument, such that

A3. Bel(V ∪ V'|U) = G(Bel(V|U), Bel(V'|U)).

With these assumptions, he gives a proof much in the spirit of that of Cox to show that Bel 
is essentially a probability distribution. Dubois and Prade point out that, in their example, 
there is no function G satisfying A3 (even if we drop the requirement that G be continuous 
and strictly increasing in each argument) (see footnote 2).

Reichenbach (1949) earlier proved a result similar to Aczel's, under somewhat stronger
assumptions. In particular, he assumed A3, with G being +.

Other variants of Cox's result have also been considered in the literature. For example, 
Heckerman (1988) and Horvitz, Heckerman, and Langlotz (1986) assume that F is continuous
and strictly increasing in each argument and S is continuous and strictly decreasing.
Since min is not strictly increasing in each argument, it fails this restriction too (see
footnote 3). Aleliunas (1988) gives yet another collection of assumptions and claims that
they suffice to guarantee that Bel is essentially a probability distribution.

2. In fact, Aczel allows there to be a different function G_U for each set U on the right-hand side of the
conditional. However, the Dubois-Prade example does not even satisfy this weaker condition. 

3. Actually, the restriction that F be strictly increasing in each argument is a little too strong. If e = Bel(∅),
then it can be shown that F(e, x) = F(x, e) = e for all x, so that F is not strictly increasing if one of its 
arguments is e. 
The first to observe potential problems with Cox's result is Paris (1994). As he puts
it, "Cox's proof is not, perhaps, as rigorous as some pedants might prefer and when an 
attempt is made to fill in all the details some of the attractiveness of the original is lost." 
Paris provides a rigorous proof of the result, assuming that the range of Bel is contained 
in [0, 1] and using assumptions similar to those of Horvitz, Heckerman, and Langlotz. In 
particular, he assumes that F is continuous and strictly increasing in (0, 1]² and that S is
decreasing. However, he makes use of one additional assumption that, as he himself says, 
is not very appealing: 

A4. For all 0 ≤ α, β, γ ≤ 1 and ε > 0, there are sets U₁ ⊇ U₂ ⊇ U₃ ⊇ U₄ such that U₃ ≠ ∅,
and each of |Bel(U₄|U₃) - α|, |Bel(U₃|U₂) - β|, and |Bel(U₂|U₁) - γ| is less than ε.

Notice that this assumption forces the range of Bel to be dense in [0, 1]. This means that, 
in particular, the domain W on which Bel is defined cannot be finite. 

Is this assumption really necessary? Paris suggests that Aczel needs something like it. 
(This issue is discussed in further detail below.) The counterexample of this paper gives 
further evidence. It shows that Cox's result fails in finite domains, even if we assume that 
the range of Bel is in [0, 1], S(x) = 1 - x (so that, in particular, S is twice differentiable and
monotonically decreasing), G(x, y) = x + y, and F is infinitely differentiable and strictly
increasing on (0, 1]². We can further assume that F is commutative, F(0, x) = F(x, 0) = 0,
and that F(x, 1) = F(1, x) = x. The example emphasizes the point that the applicability
of Cox's result is far narrower than was previously believed. It remains an open question 
as to whether there is an appropriate strengthening of the assumptions that does give us 
Cox's result in finite settings. There is further discussion of this issue in Section 5. 

In fact, the example shows even more. In the course of his proof, Cox claims to show 
that F must be an associative function, that is, that F(x,F(y,z)) = F(F(x,y), z). For the 
Bel of the counterexample, there can be no associative function F satisfying A2. It is this 
observation that is the key to showing that there is no probability distribution isomorphic 
to Bel. 

What is going on here? Actually, Cox's proof shows that F(x, F(y, z)) = F(F(x, y), z)
only for those triples (x, y, z) such that, for some sets U₁, U₂, U₃, and U₄, we have
x = Bel(U₄|U₃ ∩ U₂ ∩ U₁), y = Bel(U₃|U₂ ∩ U₁), and z = Bel(U₂|U₁). If the set of such
triples (x, y, z) is dense in [0, 1]³, then we conclude by continuity that F is associative. The
content of A4 is precisely that the set of such triples is dense in [0, 1]³. Of course, if W
is finite, we cannot have density. As my counterexample shows, we do not in general have
associativity in finite domains. Moreover, this lack of associativity can result in the failure 
of Cox's theorem. 

A similar problem seems to exist in Aczel's proof (as already observed by Paris (1994)). 
While Aczel's proof does not involve showing that F is associative, it does involve showing 
that G is associative. Again, it is not hard to show that G is associative for appropriate 
triples, just as is the case for F. But it seems that Aczel also needs an assumption that 
guarantees that the appropriate set of triples is dense, and it is not clear that his assumptions 
do in fact guarantee this (see footnote 4). As shown in Section 2, the problem also arises in Reichenbach's
proof. 

The counterexample to Cox's theorem, with slight modifications, can also be used to 
show that another well-known result in the literature is not completely correct. In his seminal
book on probability and qualitative probability (1973), Fine considers a non-numeric
notion of comparative (conditional) probability, which allows us to say "U given V is at least
as probable as U' given V'", denoted U|V ⪰ U'|V'. Conditions on ⪰ are given that are
claimed to force the existence of (among other things) a function Bel such that U|V ⪰ U'|V'
iff Bel(U|V) ≥ Bel(U'|V') and an associative function F satisfying A2. (This is Theorem
8 of Chapter II in (Fine, 1973).) However, the Bel defined in my counterexample to Cox's
theorem can be used to give a counterexample to this result as well. 

Interestingly, this is not the first time a similar error has been noted in the use of 
functional equations. Falmagne (1981) gives another example (in a case involving a utility 
model of choice behavior) and mentions that he knows "of at least two similar examples in 
the psychological literature". 

The remainder of this paper is organized as follows. In the next section there is a more 
detailed discussion of the problem in Cox's proof. The counterexample to Cox's theorem is 
given in Section 3. The following section shows that it is also a counterexample to Fine's 
theorem. Section 5 concludes with some discussion, particularly of assumptions under which 
Cox's theorem might hold. 

2. The Problem With Cox's Proof 

To understand the problems with Cox's proof, I actually consider Reichenbach's proof,
which is similar in spirit to Cox's proof (it is actually even closer to Aczel's proof), but uses
some additional assumptions, which make it easier to explain in detail. Aczel, Cox, and
Reichenbach all make critical use of functional equations in their proofs, and they make the
same (seemingly unjustified) leap at corresponding points in their proofs.

In the notation of this paper, Reichenbach (1949, pp. 65-67) assumes (1) that the range
of Bel(·|·) is a subset of [0, 1], (2) that Bel(V|U) = 1 if U ⊆ V, (3) that if V and V' are disjoint,
then Bel(V ∪ V'|U) = Bel(V|U) + Bel(V'|U) (thus, he assumes that A3 holds, with G being
+), and (4) that A2 holds with a function F that is differentiable. (He remarks that the
result holds even without assumption (4), although the proof is more complicated; Aczel in
fact does not make an assumption like (4).)

Reichenbach's proof proceeds as follows: Replacing V' in A2 by V₁ ∪ V₂, where V₁ and
V₂ are disjoint, we get that

Bel(V ∩ (V₁ ∪ V₂)|U) = F(Bel(V₁ ∪ V₂|V ∩ U), Bel(V|U)). (2)

Using the fact that G is +, we immediately get

Bel(V ∩ (V₁ ∪ V₂)|U) = Bel(V ∩ V₁|U) + Bel(V ∩ V₂|U) (3)

4. I should stress that my counterexample is not a counterexample to Aczel's theorem, since he explicitly 
assumes that the range of Bel is infinite. However, it does point out potential problems with his proof, 
and certainly shows that his argument does not apply to finite domains. Aczel is in fact aware of the 
problems with his proof [private communication, 1996]. He later proved results in a similar spirit with 
the aid of a requirement of nonatomicity (Aczel & Daroczy, 1975, pp. 5-6), which is in fact a stronger 
requirement than A4, and thus also requires the domain to be infinite. 
and

F(Bel(V₁ ∪ V₂|V ∩ U), Bel(V|U)) = F(Bel(V₁|V ∩ U) + Bel(V₂|V ∩ U), Bel(V|U)). (4)

Moreover, by A2, we also have, for i = 1, 2,

Bel(V ∩ Vᵢ|U) = F(Bel(V ∩ Vᵢ|V ∩ U), Bel(V|U)). (5)

Putting together (2), (3), (4), and (5), we get that

F(Bel(V ∩ V₁|V ∩ U), Bel(V|U)) + F(Bel(V ∩ V₂|V ∩ U), Bel(V|U))
= F(Bel(V ∩ V₁|V ∩ U) + Bel(V ∩ V₂|V ∩ U), Bel(V|U)). (6)

Taking x = Bel(V ∩ V₁|V ∩ U), y = Bel(V ∩ V₂|V ∩ U), and z = Bel(V|U) in (6), we get
the functional equation

F(x, z) + F(y, z) = F(x + y, z). (7)

Suppose that we assume (as Reichenbach implicitly does) that this functional equation
holds for all (x, y, z) ∈ P = {(x, y, z) ∈ [0, 1]³ : x + y ≤ 1}. The rest of the proof now follows
easily. First, taking x = 0 in (7), it follows that

F(0,z)+F(y,z) = F(y,z), 

from which we get that 

F(0,z) = 0. 

Next, fix z and let g_z(x) = F(x, z). Since F is, by assumption, differentiable, from (7) we
have that

g'_z(x) = lim_{y→0} (F(x + y, z) - F(x, z))/y = lim_{y→0} F(y, z)/y.

It thus follows that g'_z(x) is a constant, independent of x. Since the constant may depend
on z, there is some function h such that g'_z(x) = h(z). Using the fact that F(0, z) = 0,
elementary calculus tells us that

g_z(x) = F(x, z) = h(z)x.

Using the assumption that for all U, V, we have Bel(V|U) = 1 if U ⊆ V, we get that

Bel(V|U) = Bel(U ∩ V|U) = F(Bel(V|V ∩ U), Bel(V|U)) = F(1, Bel(V|U)).

Thus, we have that

F(1, z) = h(z) = z.

We conclude that F(x, z) = xz. 

Note, however, that this conclusion depends in a crucial way on the assumption that 
the functional equation (7) holds for all (x, y, z) ∈ P (see footnote 5). In fact, all that we can conclude
from (6) is that it holds for all (x, y, z) such that there exist U, V, V₁, and V₂, with V₁ and
V₂ disjoint, such that x = Bel(V ∩ V₁|V ∩ U), y = Bel(V ∩ V₂|V ∩ U), and z = Bel(V|U).

5. Actually, using the continuity of F, it suffices that the functional equation holds for a set of triples which 
is dense in P. 
Let us say that a triple that satisfies this condition is R-constrained (since it must satisfy
certain constraints imposed by the F and G functions; the R here is for Reichenbach, to
distinguish this notion from a similar one defined in the next section). As I mentioned
earlier, Aczel also assumes that Bel(V|U) takes on all values in [e, E], where e = Bel(∅|U)
and E = Bel(U|U). (In Reichenbach's formulation, e = 0 and E = 1.) There are two ways
to interpret this assumption. The weak interpretation is that for each x ∈ [0, 1], there exist
U, V such that Bel(V|U) = x. The strong interpretation is that for each U and x, there
exists V such that Bel(V|U) = x. It is not clear which interpretation is intended by Aczel.
Neither one obviously suffices to prove that every triple in P is R-constrained, although it
does seem plausible that it might follow from the strong interpretation.

In any case, neither Aczel nor Reichenbach sees a need to check that Equation (7) holds
throughout P. (Nor does Cox for his analogous functional equation, nor do the authors of
more recent and polished presentations of Cox's result, such as Jaynes (1996) and Tribus
(1969).) However, it turns out to be quite necessary to do this. Moreover, it is clear that if
W is finite, there are only finitely many triples in P that are R-constrained, and it is not the case
that all of P is. As we shall see in the next section, this observation has serious consequences 
as far as all these proofs are concerned. 
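To make the finiteness point concrete, the following Python sketch enumerates every R-constrained triple for one small example. The three-point domain with the uniform probability, and the names `bel`, `subsets`, and `r_constrained_triples`, are assumptions chosen for illustration; they are not part of Reichenbach's or Aczel's setup.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical finite domain: three points, with Bel the uniform conditional probability.
W = frozenset(range(3))

def subsets(X):
    """All subsets of X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def bel(V, U):
    """Uniform conditional probability of V given a nonempty U."""
    return Fraction(len(V & U), len(U))

def r_constrained_triples():
    """Collect (x, y, z) arising from some U, V, V1, V2 with V1, V2 disjoint."""
    triples = set()
    for U, V, V1, V2 in product(subsets(W), repeat=4):
        if not (V & U) or (V1 & V2):
            continue  # need V ∩ U nonempty and V1, V2 disjoint
        x = bel(V & V1, V & U)
        y = bel(V & V2, V & U)
        z = bel(V, U)
        triples.add((x, y, z))
    return triples

triples = r_constrained_triples()
print(len(triples))
# (1/4, 1/4, 1/2) lies in P but is not R-constrained in this domain.
print((Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)) in triples)
```

In this example every coordinate of an R-constrained triple has denominator at most 3, so the functional equation (7) is only ever constrained on a finite grid inside P.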

3. The Counterexample to Cox's Theorem 

The goal of this section is to prove 

Theorem 3.1: There is a function Bel₀, a finite domain W, and functions S, F, and G
satisfying A1, A2, and A3 respectively such that

• Bel₀(V|U) ∈ [0, 1] for U ≠ ∅,

• S(x) = 1 - x (so that S is strictly decreasing and infinitely differentiable),

• G(x, y) = x + y (so that G is strictly increasing in each argument and is infinitely
differentiable),

• F is infinitely differentiable, nondecreasing in each argument in [0, 1]², and strictly
increasing in each argument in (0, 1]². Moreover, F is commutative, F(x, 0) = F(0, x) =
0, and F(x, 1) = F(1, x) = x.

However, there is no one-to-one onto function g : [0, 1] → [0, 1] satisfying (1).

Note that the hypotheses on Bel₀, S, G, and F are at least as strong as those made in all
the other variants of Cox's result, while the assumptions on g are weaker than those made
in the variants. For example, there is no requirement that g be continuous or increasing
nor that g ∘ Bel₀ is a probability distribution (although Paris and Aczel both prove that,
under their assumptions, g can be taken to satisfy all these requirements). This serves to 
make the counterexample quite strong. 
The proof of Theorem 3.1 is constructive. Consider a domain W with 12 points:
w₁, ..., w₁₂. We associate with each point w ∈ W a weight f(w), as follows.

f(w₁) = 3    f(w₄) = 5 × 10⁴    f(w₇) = 3 × 10⁸    f(w₁₀) = 3 × 10¹⁸
f(w₂) = 2    f(w₅) = 6 × 10⁴    f(w₈) = 8 × 10⁸    f(w₁₁) = 2 × 10¹⁸
f(w₃) = 6    f(w₆) = 8 × 10⁴    f(w₉) = 8 × 10⁸    f(w₁₂) = 14 × 10¹⁸

For a subset U of W, we define f(U) = Σ_{w ∈ U} f(w). Thus, we can define a probability
distribution Pr on W by taking Pr(U) = f(U)/f(W).

Let f' be identical to f, except that f'(w₁₀) = (3 - δ) × 10¹⁸ and f'(w₁₁) = (2 +
δ) × 10¹⁸, where δ is defined below. Again, we extend f' to subsets of W by defining
f'(U) = Σ_{w ∈ U} f'(w). Let W' = {w₁₀, w₁₁, w₁₂}. If U ≠ ∅, define

Bel₀(V|U) = f'(V ∩ U)/f(U) if W' ⊆ U,
Bel₀(V|U) = f(V ∩ U)/f(U) otherwise.

Bel₀ is clearly very close to Pr. If U ≠ ∅, then it is easy to see that |Bel₀(V|U) - Pr(V|U)| ≤
|f'(V ∩ U) - f(V ∩ U)|/f(U) ≤ δ. We choose δ > 0 so that

if Pr(V|U) > Pr(V'|U'), then Bel₀(V|U) > Bel₀(V'|U'). (8)

Since the range of Pr is finite, all sufficiently small δ satisfy (8).

The exact choice of weights above is not particularly important. One thing that is 
important though is the following collection of equalities: 

Pr(w₁|{w₁, w₂}) = Pr(w₁₀|{w₁₀, w₁₁}) = 3/5
Pr({w₁, w₂}|{w₁, w₂, w₃}) = Pr(w₄|{w₄, w₅}) = 5/11
Pr({w₄, w₅}|{w₄, w₅, w₆}) = Pr({w₇, w₈}|{w₇, w₈, w₉}) = 11/19     (9)
Pr(w₄|{w₄, w₅, w₆}) = Pr({w₁₀, w₁₁}|{w₁₀, w₁₁, w₁₂}) = 5/19
Pr(w₁|{w₁, w₂, w₃}) = Pr(w₇|{w₇, w₈}) = 3/11.

It is easy to check that exactly the same equalities hold if we replace Pr by Bel₀.
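The construction can be replayed with exact rational arithmetic. The Python sketch below builds Pr and Bel₀ from the weights above and tests the equalities in (9) for both. The names `pr`, `bel0`, `total`, and `EQUALITIES_9` are hypothetical, and the value `DELTA = 10**-40` is an assumed stand-in for δ: the text only requires δ to be small enough for (8), and this sketch does not itself verify (8).

```python
from fractions import Fraction

# Weights f(w_i) from the text, keyed by the index i = 1..12.
f = {1: 3, 2: 2, 3: 6,
     4: 5 * 10**4, 5: 6 * 10**4, 6: 8 * 10**4,
     7: 3 * 10**8, 8: 8 * 10**8, 9: 8 * 10**8,
     10: 3 * 10**18, 11: 2 * 10**18, 12: 14 * 10**18}
W_PRIME = frozenset({10, 11, 12})

# Assumed value for delta (hypothetical; not checked against condition (8)).
DELTA = Fraction(1, 10**40)
f_prime = {w: Fraction(v) for w, v in f.items()}
f_prime[10] = (3 - DELTA) * 10**18
f_prime[11] = (2 + DELTA) * 10**18

def total(weights, U):
    """Sum of the given weights over the set U."""
    return sum((Fraction(weights[w]) for w in U), Fraction(0))

def pr(V, U):
    """Pr(V|U) induced by the weights f."""
    V, U = frozenset(V), frozenset(U)
    return total(f, V & U) / total(f, U)

def bel0(V, U):
    """Bel0(V|U): use f' in the numerator exactly when W' is a subset of U."""
    V, U = frozenset(V), frozenset(U)
    weights = f_prime if W_PRIME <= U else f
    return total(weights, V & U) / total(f, U)

# Each row: (left conditional, right conditional, common value) as in (9).
EQUALITIES_9 = [
    (({1}, {1, 2}), ({10}, {10, 11}), Fraction(3, 5)),
    (({1, 2}, {1, 2, 3}), ({4}, {4, 5}), Fraction(5, 11)),
    (({4, 5}, {4, 5, 6}), ({7, 8}, {7, 8, 9}), Fraction(11, 19)),
    (({4}, {4, 5, 6}), ({10, 11}, {10, 11, 12}), Fraction(5, 19)),
    (({1}, {1, 2, 3}), ({7}, {7, 8}), Fraction(3, 11)),
]

def equalities_hold(measure):
    """True if every row of (9) holds for the given conditional measure."""
    return all(measure(*left) == value == measure(*right)
               for left, right, value in EQUALITIES_9)

print(equalities_hold(pr), equalities_hold(bel0))
```

The only place where Bel₀ and Pr differ is when the conditioning set contains all of W'; for instance, `bel0({10}, {10, 11, 12})` evaluates to (3 - δ)/19 rather than 3/19.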

We show that Bel₀ satisfies the requirements of Theorem 3.1 by a sequence of lemmas.
The first lemma is the key to showing that Bel₀ cannot be isomorphic to a probability
function. It uses the fact (proved in Lemma 3.3) that if Bel₀ were isomorphic to a probability
function, then there would have to be a function F satisfying A2 that is associative.
Although, as is shown in Lemma 3.7, the function F satisfying A2 can be taken to be infinitely
differentiable and increasing in each argument, the equalities in (9) suffice to guarantee that 
it cannot be taken to be associative, that is, we do not in general have 

F(x, F(y, z)) = F(F(x, y), z). 

Indeed, there is no associative function F satisfying A2, even if we drop the requirements 
that F be differentiable or increasing. 
Lemma 3.2: For Bel₀ as defined above, there is no associative function F satisfying A2.

Proof: Suppose there were such a function F. From (9), we must have that

F(5/11, 11/19)
= F(Bel₀(w₄|{w₄, w₅}), Bel₀({w₄, w₅}|{w₄, w₅, w₆}))
= Bel₀(w₄|{w₄, w₅, w₆}) = 5/19

and that

F(3/5, 5/11)
= F(Bel₀(w₁|{w₁, w₂}), Bel₀({w₁, w₂}|{w₁, w₂, w₃}))
= Bel₀(w₁|{w₁, w₂, w₃}) = 3/11.

It follows that

F(3/5, F(5/11, 11/19)) = F(3/5, 5/19)

and that

F(F(3/5, 5/11), 11/19) = F(3/11, 11/19).

Thus, if F were associative, we would have

F(3/5, 5/19) = F(3/11, 11/19).

On the other hand, from (9) again, we see that

F(3/5, 5/19)
= F(Bel₀(w₁₀|{w₁₀, w₁₁}), Bel₀({w₁₀, w₁₁}|{w₁₀, w₁₁, w₁₂}))
= Bel₀(w₁₀|{w₁₀, w₁₁, w₁₂}) = (3 - δ)/19,

while

F(3/11, 11/19)
= F(Bel₀(w₇|{w₇, w₈}), Bel₀({w₇, w₈}|{w₇, w₈, w₉}))
= Bel₀(w₇|{w₇, w₈, w₉}) = 3/19.

It follows that F cannot be associative. □
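The argument of Lemma 3.2 can be traced mechanically. The Python sketch below records the four values of F that A2 forces in the proof and then evaluates both bracketings of F(3/5, 5/11, 11/19). The helper names (`bel0`, `a2_constraint`, `F_forced`) and the concrete `DELTA` are assumptions for illustration; any positive δ gives the same conclusion here.

```python
from fractions import Fraction

DELTA = Fraction(1, 10**40)  # assumed stand-in for delta (hypothetical value)
E18 = 10**18
f = {1: 3, 2: 2, 3: 6, 4: 5 * 10**4, 5: 6 * 10**4, 6: 8 * 10**4,
     7: 3 * 10**8, 8: 8 * 10**8, 9: 8 * 10**8,
     10: 3 * E18, 11: 2 * E18, 12: 14 * E18}
f_prime = dict(f)
f_prime[10] = (3 - DELTA) * E18
f_prime[11] = (2 + DELTA) * E18
W_PRIME = {10, 11, 12}

def bel0(V, U):
    """Bel0(V|U) for Python sets V and nonempty U of point indices."""
    weights = f_prime if W_PRIME <= U else f
    return Fraction(sum(weights[w] for w in V & U)) / sum(f[w] for w in U)

def a2_constraint(U3, U2, U1):
    """For a chain U1 ⊇ U2 ⊇ U3, A2 forces F(Bel0(U3|U2), Bel0(U2|U1)) = Bel0(U3|U1)."""
    return (bel0(U3, U2), bel0(U2, U1)), bel0(U3, U1)

# The four chains (U3, U2, U1) used in the proof of Lemma 3.2.
chains = [({4}, {4, 5}, {4, 5, 6}),
          ({1}, {1, 2}, {1, 2, 3}),
          ({10}, {10, 11}, {10, 11, 12}),
          ({7}, {7, 8}, {7, 8, 9})]
F_forced = dict(a2_constraint(*c) for c in chains)

# F(3/5, F(5/11, 11/19)) versus F(F(3/5, 5/11), 11/19), using only forced values.
left = F_forced[(Fraction(3, 5), F_forced[(Fraction(5, 11), Fraction(11, 19))])]
right = F_forced[(F_forced[(Fraction(3, 5), Fraction(5, 11))], Fraction(11, 19))]
print(left, right, left == right)
```

Because every lookup succeeds using only values that A2 pins down, no choice of F on the remaining pairs can repair the mismatch between the two bracketings.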

To understand how Lemma 3.2 relates to our discussion in Section 2 of the problems
with Reichenbach's proof, we say (x, y, z) is a constrained triple if there exist sets U₁ ⊇ U₂ ⊇
U₃ ⊇ U₄ with U₃ ≠ ∅ such that x = Bel₀(U₄|U₃), y = Bel₀(U₃|U₂), and z = Bel₀(U₂|U₁).
It is easy to see that A2 forces F to be associative on constrained triples, since if w =
Bel₀(U₃|U₁) and w' = Bel₀(U₄|U₂), by A2, we have F(x, F(y, z)) = F(x, w) = Bel₀(U₄|U₁)
and F(F(x, y), z) = F(w', z) = Bel₀(U₄|U₁). A4 says that the set of constrained triples is
dense in [0, 1]³.

We similarly define (x, y) to be a constrained pair if there exist sets U₁ ⊇ U₂ ⊇ U₃
with U₂ ≠ ∅ such that x = Bel₀(U₃|U₂) and y = Bel₀(U₂|U₁). We say that (U₁, U₂, U₃)
corresponds to the constrained pair (x, y). (Note that there may be more than one triple
of sets corresponding to a constrained pair.) If (U₁, U₂, U₃) corresponds to the constrained
pair (x, y) and F satisfies A2, then we must have F(x, y) = Bel₀(U₃|U₁). Note that both
(3/5, 5/11) and (5/11, 11/19) are constrained pairs, although the triple (3/5, 5/11, 11/19)
is not constrained. It is this fact that we use in Lemma 3.2.

The next lemma shows that Bel₀ cannot be isomorphic to a probability function.
Lemma 3.3: For Bel₀ as defined above, there is no one-to-one onto function g : [0, 1] →
[0, 1] satisfying (1).

Proof: Suppose there were such a function g. First note that g(Bel₀(U)) ≠ 0 if U ≠ ∅. For
if g(Bel₀(U)) = 0, then it follows from (1) that for all V ⊆ U, we have

g(Bel₀(V)) = g(Bel₀(V|U)) × g(Bel₀(U)) = g(Bel₀(V|U)) × 0 = 0.

Thus, g(Bel₀(V)) = g(Bel₀(U)) for all subsets V of U. Since the definition of Bel₀ guarantees
that Bel₀(V) ≠ Bel₀(U) if V is a strict subset of U, this contradicts the assumption that g
is one-to-one. Thus, g(Bel₀(U)) ≠ 0 if U ≠ ∅. It now follows from (1) that if U ≠ ∅, then

g(Bel₀(V|U)) = g(Bel₀(V ∩ U))/g(Bel₀(U)). (10)

Now define F(x, y) = g⁻¹(g(x) × g(y)). We show that F defined in this way satisfies A2
and is associative. This will give us a contradiction to Lemma 3.2.

To see that F satisfies A2, notice that, by applying the observation above repeatedly, if
V ∩ U ≠ ∅, we get

F(Bel₀(V'|V ∩ U), Bel₀(V|U))
= g⁻¹(g(Bel₀(V'|V ∩ U)) × g(Bel₀(V|U)))
= g⁻¹((g(Bel₀(V' ∩ V ∩ U))/g(Bel₀(V ∩ U))) × (g(Bel₀(V ∩ U))/g(Bel₀(U))))
= g⁻¹(g(Bel₀(V' ∩ V ∩ U))/g(Bel₀(U)))
= g⁻¹(g(Bel₀(V' ∩ V|U)))
= Bel₀(V' ∩ V|U).

Thus, F satisfies A2.

To see that F is associative, note that

F(F(x, y), z) = g⁻¹(g(g⁻¹(g(x) × g(y))) × g(z))
= g⁻¹(g(x) × g(y) × g(z))
= g⁻¹(g(x) × g(g⁻¹(g(y) × g(z))))
= F(x, F(y, z)).

This gives us the desired contradiction to Lemma 3.2. It follows that Bel₀ cannot be
isomorphic to a probability function. □
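The associativity step in this proof depends only on g being a bijection, which a small numerical experiment can illustrate. In the Python sketch below, the particular map g(x) = x/(2 - x) on [0, 1], with inverse 2y/(1 + y), is an assumption chosen only because it is a rational bijection of [0, 1] that allows exact arithmetic; it has no special role in the paper.

```python
from fractions import Fraction
from itertools import product

def g(x):
    """A hypothetical one-to-one onto map of [0, 1] to itself."""
    return x / (2 - x)

def g_inv(y):
    """Inverse of g on [0, 1]."""
    return 2 * y / (1 + y)

def F(x, y):
    """F(x, y) = g^{-1}(g(x) * g(y)), as defined in the proof of Lemma 3.3."""
    return g_inv(g(x) * g(y))

# Test associativity exactly on a small grid of rationals in [0, 1].
grid = [Fraction(k, 7) for k in range(8)]
associative = all(F(F(x, y), z) == F(x, F(y, z))
                  for x, y, z in product(grid, repeat=3))
print(associative)
```

Any F obtained by transporting multiplication through a bijection in this way is associative, which is why the non-associativity forced by Lemma 3.2 rules out every such g.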

Despite the fact that Bel₀ is not isomorphic to a probability function, functions S, F, and
G can be defined that satisfy A1, A2, and A3, respectively, and all the other requirements
stated in Theorem 3.1. The argument for S and G is easy; all the work goes into proving
that an appropriate F exists.

Lemma 3.4: There exists an infinitely differentiable, strictly decreasing function S :
[0, 1] → [0, 1] such that Bel₀(V̄|U) = S(Bel₀(V|U)) for all sets U, V ⊆ W with U ≠ ∅.
In fact, we can take S(x) = 1 - x.

Proof: This is immediate from the observation that Bel₀(V̄|U) = 1 - Bel₀(V|U) for U, V ⊆
W. □

Lemma 3.5: There exists an infinitely differentiable function G : [0, 1]² → [0, 1], increasing
in each argument, such that if U, V, V' ⊆ W, V ∩ V' = ∅, and U ≠ ∅, then Bel₀(V ∪ V'|U) =
G(Bel₀(V|U), Bel₀(V'|U)). In fact, we can take G(x, y) = x + y.

Proof: This is immediate from the definition of Bel₀. □

Thus, all that remains is to show that an appropriate F exists. The key step is provided 
by the following lemma, which essentially shows that there is a well defined F that is 
increasing. 

Lemma 3.6: If U₂ ∩ U₁ ≠ ∅ and V₂ ∩ V₁ ≠ ∅, then

(a) if Bel₀(V₃|V₂ ∩ V₁) ≤ Bel₀(U₃|U₂ ∩ U₁) and Bel₀(V₂|V₁) ≤ Bel₀(U₂|U₁), then Bel₀(V₃ ∩
V₂|V₁) ≤ Bel₀(U₃ ∩ U₂|U₁),

(b) if Bel₀(V₃|V₂ ∩ V₁) < Bel₀(U₃|U₂ ∩ U₁), Bel₀(V₂|V₁) ≤ Bel₀(U₂|U₁), Bel₀(U₃|U₂ ∩ U₁) >
0, and Bel₀(U₂|U₁) > 0, then Bel₀(V₃ ∩ V₂|V₁) < Bel₀(U₃ ∩ U₂|U₁),

(c) if Bel₀(V₃|V₂ ∩ V₁) ≤ Bel₀(U₃|U₂ ∩ U₁), Bel₀(V₂|V₁) < Bel₀(U₂|U₁), Bel₀(U₃|U₂ ∩ U₁) >
0, and Bel₀(U₂|U₁) > 0, then Bel₀(V₃ ∩ V₂|V₁) < Bel₀(U₃ ∩ U₂|U₁).

Proof: First observe that if Bel₀(V₃|V₂ ∩ V₁) ≤ Bel₀(U₃|U₂ ∩ U₁) and Bel₀(V₂|V₁) ≤
Bel₀(U₂|U₁), then from (8), it follows that Pr(V₃|V₂ ∩ V₁) ≤ Pr(U₃|U₂ ∩ U₁) and Pr(V₂|V₁) ≤
Pr(U₂|U₁). If we have either Pr(V₃|V₂ ∩ V₁) < Pr(U₃|U₂ ∩ U₁) or Pr(V₂|V₁) < Pr(U₂|U₁), then
we have either Pr(V₃ ∩ V₂|V₁) < Pr(U₃ ∩ U₂|U₁) or Pr(U₃|U₂ ∩ U₁) = 0 or Pr(U₂|U₁) = 0.
It follows that either Bel₀(V₃ ∩ V₂|V₁) < Bel₀(U₃ ∩ U₂|U₁) (this uses (8) again) or that
Bel₀(V₃ ∩ V₂|V₁) = Bel₀(U₃ ∩ U₂|U₁) = 0. In either case, the lemma holds.

Thus, it remains to deal with the case that Pr(V₃|V₂ ∩ V₁) = Pr(U₃|U₂ ∩ U₁) and
Pr(V₂|V₁) = Pr(U₂|U₁), and hence Pr(V₃ ∩ V₂|V₁) = Pr(U₃ ∩ U₂|U₁). The details of this
analysis are left to the appendix. □

Lemma 3.7: There exists a function F : [0, 1]² → [0, 1] satisfying all the assumptions of
Theorem 3.1 (with respect to Bel₀).

Proof: Define a partial function F' on [0, 1]² whose domain D consists of all constrained
pairs. For a constrained pair, we define F' in the unique way required to satisfy A2.
A priori, F' may not be well defined; it is possible that there exist triples (U₁, U₂, U₃)
and (V₁, V₂, V₃) that both correspond to (x, y) (i.e., x = Bel₀(U₃|U₂) = Bel₀(V₃|V₂) and
y = Bel₀(U₂|U₁) = Bel₀(V₂|V₁)) such that Bel₀(U₃|U₁) ≠ Bel₀(V₃|V₁). If this were the case,
then F'(x, y) would not be well defined. However, Lemma 3.6 says that this cannot happen.
Moreover, Lemma 3.6 assures us that F' is increasing on D, and strictly increasing as long
as one of its arguments is not 0. Indeed, if there is a triple (U₁, U₂, U₃) corresponding to
(x, y) such that {w₁₀, w₁₁, w₁₂} ⊈ U₁, then we must have F'(x, y) = xy.

The domain D of F' is finite. Let D' be the commutative closure of D, so that D'
consists of D and all pairs (y, x) such that (x, y) is in D. Extend F' to a commutative
function F'' on D' by defining F''(y, x) = F'(x, y) if (x, y) ∈ D. F'' is well defined because,
as can easily be verified, if (x, y) and (y, x) are both in D, one of x or y must be 1, and
F'(x, 1) = F'(1, x) = x. Clearly F'' is commutative. It is also increasing. For suppose
(x, y), (x', y') ∈ D', x ≤ x', and y ≤ y'. If both (x, y) and (x', y') are in D, we must have
F''(x, y) ≤ F''(x', y'), since F' is increasing. Similarly, if both (y, x) and (y', x') are in D,
we must have F''(x, y) = F'(y, x) ≤ F'(y', x') = F''(x', y'). Finally, if (x, y) and (y', x') are
in D, a straightforward check over all possible elements in D shows that this can happen
only if the triples (U₁, U₂, U₃) and (V₁, V₂, V₃) corresponding to (x, y) and (y', x') are such
that {w₁₀, w₁₁, w₁₂} is not a subset of either U₁ or V₁. It follows that F'(x, y) = xy and
F'(y', x') = x'y', so again we get that F'' is increasing. A similar argument shows that F''
is strictly increasing as long as one of its arguments is not 0.

It is straightforward to extend F'' to a commutative, infinitely differentiable, and
increasing function F defined on all of [0, 1]², which is strictly increasing on (0, 1]², and satisfies
F(x, 1) = F(1, x) = x and F(x, 0) = F(0, x) = 0. We proceed as follows. We first extend
F'' so that it is defined for all pairs (x, y) ∈ [0, 1]² such that x ≥ y and has the required
properties. If x < y, we then define F so that F(x, y) = F(y, x). Since F'' is commutative,
this definition agrees with F''(x, y) for x < y. Clearly F is commutative and infinitely
differentiable. To see that F is increasing, suppose that x ≤ x' and y ≤ y'. Just as in the
case of F'', it is immediate that F is increasing if both x ≥ y and x' ≥ y' or both x < y and
x' < y'. Otherwise, suppose x ≥ y and y' > x'. Then we have y ≤ x ≤ x' < y'. Since F is
increasing on {(x, y) : x ≥ y}, we have F(x, y) ≤ F(x', y) ≤ F(x', x') ≤ F(y', x') = F(x', y').
A similar argument shows that F is strictly increasing unless one of its arguments is 0. Finally,
F clearly satisfies A2, since (by construction) F' does, and A2 puts constraints only on the
domain of F'. □

Theorem 3.1 now follows from Lemmas 3.3, 3.4, 3.5, and 3.7.

4. The Counterexample to Fine's Theorem

Fine is interested in what he calls comparative conditional probability. Thus, rather than
associating a real number with each "conditional object" V|U, he puts an ordering ⪰ on
such objects. As usual, V|U ≻ V'|U' is taken to be an abbreviation for V|U ⪰ V'|U' and
not(V'|U' ⪰ V|U).

Fine is interested in when such an ordering is induced by a real-valued belief function
with reasonable properties. He says that a real-valued function P on such objects agrees
with ⪰ if P(V|U) ≥ P(V'|U') iff V|U ⪰ V'|U'. Fine then considers a number of axioms
that ⪰ might satisfy. For our purposes, the most relevant are the ones Fine denotes QCC1,
QCC2, QCC5, and QCC7.

QCC1 just says that ⪰ is a linear order:

QCC1. V|U ⪰ V'|U' or V'|U' ⪰ V|U.

QCC2 says that ⪰ is transitive:

QCC2. If V₁|U₁ ⪰ V₂|U₂ and V₂|U₂ ⪰ V₃|U₃, then V₁|U₁ ⪰ V₃|U₃.

QCC5 is a technical condition involving notions of order topology. The relevant definitions
are omitted here (see (Fine, 1973) for details), since QCC5, as Fine observes, holds
vacuously in finite domains (the only ones of interest here).

QCC5. The set {V|U} has a countable basis in the order topology induced by ≻.

Finally, QCC7 essentially says that ⪰ is increasing, in the sense of Lemma 3.6.
QCC7.

(a) If V₃|V₂ ∩ V₁ ⪰ U₃|U₂ ∩ U₁ and V₂|V₁ ⪰ U₂|U₁, then V₃ ∩ V₂|V₁ ⪰ U₃ ∩ U₂|U₁.

(b) If V₃|V₂ ∩ V₁ ⪰ U₂|U₁ and V₂|V₁ ⪰ U₃|U₂ ∩ U₁, then V₃ ∩ V₂|V₁ ⪰ U₃ ∩ U₂|U₁.

(c) If V₃|V₂ ∩ V₁ ≻ U₃|U₂ ∩ U₁, V₂|V₁ ⪰ U₂|U₁, and V₂|V₁ ≻ ∅|W, then V₃ ∩ V₂|V₁ ≻
U₃ ∩ U₂|U₁.

Fine then claims the following theorem:

Fine's Theorem: (Fine, 1973, Chapter II, Theorem 8) If ⪰ satisfies QCC1, QCC2, and QCC5,
then there exists some agreeing function P. There exists a function F of two variables such
that

1. P(V ∩ V'|U) = F(P(V'|V ∩ U), P(V|U)) (see footnote 6),
2. F(x, y) = F(y, x),
3. F(x, y) is increasing in x for y > P(∅|W),
4. F(x, F(y, z)) = F(F(x, y), z),
5. F(P(W|U), y) = y,
6. F(P(∅|U), y) = P(∅|U),

iff ⪰ also satisfies QCC7.

The only relevant clauses for our purposes are Clause (1), which is just A2, and Clause
(4), which says that $F$ is associative. As Lemma 3.2 shows, there is no associative function
satisfying A2 for $\mathrm{Bel}_0$. As I now show, this means that Fine's theorem does not quite hold
either.

Before doing so, let me briefly touch on a subtle issue regarding the domain of $\succeq$. In
the counterexample of the previous section, $\mathrm{Bel}_0(V|U)$ is defined as long as $U \ne \emptyset$. Fine
does not assume that the $\succeq$ relation is necessarily defined on all objects $V|U$ such that
$U, V \subseteq W$ and $U \ne \emptyset$. He assumes that there is an algebra $\mathcal{F}$ of subsets of $W$ (that is, a
set of subsets closed under finite intersections and complementation) and a subset $\mathcal{F}'$ of $\mathcal{F}$
closed under finite intersections and not containing the empty set such that $\succeq$ is defined on
conditional objects $V|U$ such that $V \in \mathcal{F}$ and $U \in \mathcal{F}'$. Since $\mathcal{F}'$ is closed under intersection
and does not contain the empty set, $\mathcal{F}'$ cannot contain disjoint sets. If $W$ is finite, then
the only way a collection $\mathcal{F}'$ can meet Fine's restriction is if there is some nonempty set $U_0$
such that all elements in $\mathcal{F}'$ contain $U_0$. This restriction is clearly too strong to the extent
that comparative conditional probability is intended to generalize probability. If Pr is a
probability function, then it certainly makes sense to compare $\Pr(V|U)$ and $\Pr(V'|U')$ even
if $U$ and $U'$ are disjoint sets. Fine [private communication, 1995] suggested that it might
be better to constrain QCC7 so that we do not condition on events $U$ that are equivalent to
$\emptyset$ (where $U$ is equivalent to $\emptyset$ if $\emptyset \succeq U$ and $U \succeq \emptyset$). Since the only event equivalent to $\emptyset$ in
the counterexample of the previous section is $\emptyset$ itself, this means that the counterexample
can be used without change. This is what is done in the proof below. After the proof, I show
how to modify the counterexample so that it satisfies Fine's original restrictions.

6. Fine assumes that $P(V \cap V'|U) = F(P(V|U), P(V'|V \cap U))$. I have reordered the arguments here for
consistency with Cox's theorem.

Theorem 4.1: There exists an ordering $\succeq$ satisfying QCC1, QCC2, QCC5, and QCC7,
such that for every function $P$ agreeing with $\succeq$, there is no associative function $F$ of two
variables such that $P(V \cap V'|U) = F(P(V'|V \cap U), P(V|U))$.

Proof: Let $W$ and $\mathrm{Bel}_0$ be as in the counterexample in the previous section. Define $\succeq$
so that $\mathrm{Bel}_0$ agrees with $\succeq$. Thus, $V|U \succeq V'|U'$ iff $\mathrm{Bel}_0(V|U) \ge \mathrm{Bel}_0(V'|U')$. Clearly
$\succeq$ satisfies QCC1 and QCC2. As was mentioned earlier, since $W$ is finite, $\succeq$ vacuously
satisfies QCC5. Lemma 3.6 shows that $\succeq$ satisfies parts (a) and (c) of QCC7. To show that
$\succeq$ also satisfies part (b) of QCC7, we must prove that if $\mathrm{Bel}_0(V_3|V_2 \cap V_1) \ge \mathrm{Bel}_0(U_2|U_1)$ and
$\mathrm{Bel}_0(V_2|V_1) \ge \mathrm{Bel}_0(U_3|U_2 \cap U_1)$, then $\mathrm{Bel}_0(V_3 \cap V_2|V_1) \ge \mathrm{Bel}_0(U_3 \cap U_2|U_1)$. The proof of
this is almost identical to that of Lemma 3.6; we simply exchange the roles of $\Pr(V_2|V_1)$ and
$\Pr(V_3|V_2 \cap V_1)$ in that proof. I leave the details to the reader. Lemma 3.2 shows that there is
no associative function $F$ satisfying A2 for $\mathrm{Bel}_0$. All that was used in the proof was the fact
that $\mathrm{Bel}_0$ satisfied the equalities of (9). But these equalities must hold for any function
agreeing with $\succeq$. Thus, exactly the same proof shows that if $P$ is any function agreeing with
$\succeq$, then there is no associative function $F$ satisfying $P(V \cap V'|U) = F(P(V'|V \cap U), P(V|U))$.
□

I conclude this section by briefly sketching how the counterexample can be modified so
that it satisfies Fine's original restriction. Redefine $W$ by adding one more element $w_0$.
Redefine $f$ and $f'$ so that $f(w_0) = f'(w_0) = 10^{-5}$; in addition, redefine $f$ and $f'$ on $w_3$, $w_6$,
$w_9$, and $w_{12}$, so as to decrease their weight by $10^{-5}$, the weight of $w_0$. Thus,

• $f(w_3) = f'(w_3) = 6 - 10^{-5}$,

• $f(w_6) = f'(w_6) = 8 \times 10^{4} - 10^{-5}$,

• $f(w_9) = f'(w_9) = 8 \times 10^{8} - 10^{-5}$, and

• $f(w_{12}) = f'(w_{12}) = 14 \times 10^{18} - 10^{-5}$.

Finally, redefine $W'$ to be $\{w_0, w_{10}, w_{11}, w_{12}\}$. The definition of $\mathrm{Bel}_0$ in terms of $f$, $f'$, and
$W'$ remains the same. With these redefinitions, the proofs of the previous section go through
essentially unchanged. In particular, the equalities in (9) now hold if we add $w_0$ to every set.
Let $\mathcal{F}'$ consist of all subsets of $W$ containing $w_0$. Notice that $\mathcal{F}'$ is closed under intersection
and does not contain the empty set. The lack of associativity in Lemma 3.2 can now be
demonstrated by conditioning on sets in $\mathcal{F}'$. As a consequence, we get a counterexample to
Fine's theorem even when restricting to conditional objects that satisfy his restriction.
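The two properties claimed for the family of sets containing $w_0$ can be checked mechanically. The sketch below does so on a scaled-down, hypothetical five-element domain (the real $W$ has thirteen elements after the modification); the names `powerset`, `family`, and `core` are illustrative only.

```python
from itertools import combinations

def powerset(elems):
    """All subsets of elems, as frozensets."""
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

# Scaled-down stand-in for W; only the presence of w0 matters here.
W = ["w0", "w1", "w2", "w3", "w4"]
family = [s for s in powerset(W) if "w0" in s]
family_set = set(family)

# Closed under (pairwise, hence finite) intersection.
closed = all((a & b) in family_set for a in family for b in family)
# Does not contain the empty set.
no_empty = frozenset() not in family_set
# The common core U_0 shared by every member of the family.
core = frozenset.intersection(*family)

print(closed, no_empty, sorted(core))
```

The computed core is the set $U_0$ from the earlier discussion of Fine's restriction: in a finite domain, every member of such a family must contain it.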
5. Discussion

Let me summarize the status of various results in the light of the counterexample of this 
paper: 

• Cox's theorem as originally stated does not hold in finite domains. Moreover, even 
in infinite domains, the counterexample and the discussion in Section 2 suggest that 
more assumptions are required for its correctness. In particular, the claim in his proof 
that F is associative does not follow. 

• Although the counterexample given here is not a counterexample to Aczel's theorem,
his assumptions do not seem strong enough to guarantee that the function G is
associative, as he claims it is.

• The variants of Cox's theorem stated by Heckerman (1988), Horvitz, Heckerman, and 
Langlotz (1986), and Aleliunas (1988) all succumb to the counterexample. 

• The claim that the function F must be associative in Fine's theorem is incorrect. 
Fine has an analogous result (Fine, 1973, Chapter II, Theorem 4) for unconditional 
comparative probability involving a function G as in Aczel's theorem. This function 
too is claimed to be associative, and again, this does not seem to follow (although my 
counterexample does not apply to that theorem). 

Of course, the interesting question now is what it would take to recover Cox's theorem.
Paris's assumption A4 suffices, as does the stronger assumption of nonatomicity (see
Footnote 4). As we have observed, A4 forces the domain of Bel to be infinite, as does the
assumption that the range of Bel is all of [0,1]. We can always extend a domain to an
infinite (indeed, uncountable) domain by assuming that we have an infinite collection of
independent fair coins, and that we can talk about outcomes of coin tosses as well as the
original events in the domain. (This type of "extendibility" assumption is fairly standard;
for example, it is made by Savage (1954) in quite a different context.) In such an extended
domain, it seems reasonable to also assume that Bel varies uniformly between 0 (certain
falsehood) and 1 (certain truth). If we also assume A4 (or something like it), we can then
recover Cox's theorem. Notice, however, that this viewpoint disallows a notion of belief
that takes on only finitely many gradations.

Another possibility is to observe that we are not interested in just one domain in isolation.
Rather, what we are interested in is a notion of belief Bel that applies uniformly to all
domains. Thus, even if $(U,V)$ and $(U',V')$ are pairs of subsets of different (perhaps even
disjoint) domains, if $\mathrm{Bel}(V|U)$ and $\mathrm{Bel}(V'|U')$ are both 1/2, then we would expect this to
denote the same relative strength of belief. In this setting, an analogue of A4 seems more
reasonable. That is, we can assume that for all $0 \le \alpha, \beta, \gamma \le 1$ and $\epsilon > 0$, there is some
domain $W$ and subsets $U_1$, $U_2$, $U_3$, and $U_4$ of $W$ such that the conclusion of A4 holds. If
we further assume that the functions $F$, $G$, and $S$ are also uniform across domains (that is,
that A1, A2, and A3 hold for the same choice of $F$, $G$, and $S$ in every domain), then we
can again recover Cox's theorem.$^7$

7. This point was independently observed by Jeff Paris [private communication, 1996]. 
The idea of having a notion of uncertainty that applies uniformly in all domains seems
implicit in some of the discussion in Jaynes' recent book on probability theory (1996). Jaynes
focuses almost exclusively on finite domains.$^8$ As he says, "In principle, every problem must
start with such finite set probabilities; extensions to infinite sets is permitted only when
this is the result of a well-defined and well-behaved limiting process from a finite set." To
make sense of this limiting process, it seems that Jaynes must be assuming that the same
notion of uncertainty applies in all domains. Moreover, one can make arguments appealing
to continuity that when we consider such limiting processes, we can always find subsets $U_1$,
$U_2$, $U_3$, and $U_4$ in some sufficiently rich (but finite) extension of the original domain such
that A4 holds.

While this seems like perhaps the most reasonable additional assumption required to
get Cox's result, it does require us to consider many domains at once. Moreover, it does
not allow a notion of belief that has only finitely many gradations, let alone a notion of
belief that allows some events to be considered incomparable in likelihood.$^9$

Suppose we really are interested in one particular finite domain, and we do not want 
to extend it or consider all other possible domains. What assumptions do we then need 
to get Cox's theorem? The counterexample given here could be circumvented by requiring 
that F be associative on all tuples (rather than just on the constrained triples). However, if 
we really are interested in a single domain, the motivation for making requirements on the 
behavior of F on belief values that do not arise is not so clear. Moreover, it is far from clear 
that assuming that F is associative suffices to prove the theorem. For example, Cox's proof 
makes use of various functional equations involving F and S, analogous to the equation (7) 
that appears in Section 2. These functional equations are easily seen to hold for certain 
tuples. However, as we saw in Section 2, the proof really requires that they hold for all 
tuples. Just assuming that F is associative does not appear to suffice to guarantee that the 
functional equations involving S hold for all tuples. Further assumptions appear necessary. 

Nir Friedman [private communication] has conjectured that the following condition, 
which says that essentially all beliefs are distinct, suffices: 

• if $\emptyset \subset U \subset V$, $\emptyset \subset U' \subset V'$, and $(U,V) \ne (U',V')$, then $\mathrm{Bel}(U|V) \ne \mathrm{Bel}(U'|V')$.

Even if this condition suffices, note that it precludes, for example, a uniform probability 
distribution, and thus again seems unduly restrictive. 
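To see concretely why the condition rules out a uniform distribution, consider the following minimal sketch on a hypothetical three-element domain $\{a, b, c\}$. Two distinct pairs of nested sets receive the same conditional belief, which the condition forbids. The function name and the domain are assumptions chosen for illustration.

```python
from fractions import Fraction

def uniform_bel(u, v):
    """Bel(U|V) under the uniform distribution: |U & V| / |V|."""
    return Fraction(len(u & v), len(v))

# Two distinct pairs, each with U a nonempty proper subset of V.
U, V = frozenset({"a"}), frozenset({"a", "b"})
U2, V2 = frozenset({"c"}), frozenset({"b", "c"})

pairs_distinct = (U, V) != (U2, V2)
same_belief = uniform_bel(U, V) == uniform_bel(U2, V2)
print(pairs_distinct, same_belief, uniform_bel(U, V))
```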

Another possibly interesting line of research is that of characterizing the functions that 
satisfy Cox's assumptions. As the example given here shows, the class of such functions 
includes functions that are not isomorphic to any probability function. I conjecture that in 
fact it includes only functions that are in some sense "close" to a function isomorphic to a 
probability distribution, although it is not clear exactly how "close" should be defined (nor 
how interesting this class really is in practice). 

So what does all this say regarding the use of probability? Not much. Although I
have tried to argue here that Cox's justification of probability is not quite as strong as
previously believed, and the assumptions underlying the variants of it need clarification,
I am not trying to suggest that probability should be abandoned. There are many other
justifications for its use.

8. Actually, Jaynes assigns probability to propositions, not sets, but, as noted earlier, there is essentially
no difference between the two.

9. Interestingly, Jaynes (1996, Appendix A) admits that having plausibility values be elements of a
partially-ordered lattice may be a reasonable alternative to traditional probability theory. Nir Friedman and I
(1995, 1996, 1997) have recently developed such a theory and shown that it provides a useful basis for
thinking about default reasoning and belief revision.
Appendix A. Proof of Lemma 3.6

Recall that all that remains in the proof of Lemma 3.6 is to deal with the case that
$\Pr(V_3|V_2 \cap V_1) = \Pr(U_3|U_2 \cap U_1)$ and $\Pr(V_2|V_1) = \Pr(U_2|U_1)$, and hence
$\Pr(V_3 \cap V_2|V_1) = \Pr(U_3 \cap U_2|U_1)$.

Before proceeding with the proof, it is useful to collect some general facts about Pr. A set
$U$ is said to be standard if $U$ is a subset of one of $\{w_1, w_2, w_3\}$, $\{w_4, w_5, w_6\}$, $\{w_7, w_8, w_9\}$, or
$\{w_{10}, w_{11}, w_{12}\}$. A real number $a$ is said to be relevant if there exists some standard $U$ and
some arbitrary $V$ such that $a = \Pr(V|U)$. Notice that even if $U \ne \emptyset$ is nonstandard, then,
taking $U'$ to be the standard subset of $U$ which has the greatest weight, we have
$|\Pr(V|U) - \Pr(V|U')| < .002$. (This is the reason that the weights are multiplied by factors such as
$10^{4}$, $10^{8}$, and $10^{18}$.) Thus, for any subsets $V$ and $U$ of $W$, we have that $\Pr(V|U)$ is close to
a relevant number (where "close" means "within .002").

Call a triple $(U, V, V')$ of subsets of $W$ good if $\mathrm{Bel}_0(V \cap V'|U) = \mathrm{Bel}_0(V'|V \cap U) \times
\mathrm{Bel}_0(V|U)$. Clearly if both $(U_1, U_2, U_3)$ and $(V_1, V_2, V_3)$ are good, then the lemma holds.
Notice that if $(U, V, V')$ is not good, then $U \supseteq \{w_{10}, w_{11}, w_{12}\}$ and $f(V \cap \{w_{10}, w_{11}, w_{12}\}) \ne
f'(V \cap \{w_{10}, w_{11}, w_{12}\})$, which means that $V \cap \{w_{10}, w_{11}, w_{12}\}$ must contain one of $w_{10}$ and
$w_{11}$, but not both, and thus must be one of $\{w_{10}\}$, $\{w_{11}\}$, $\{w_{10}, w_{12}\}$, or $\{w_{11}, w_{12}\}$.
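The enumeration behind this characterization can be sketched as follows. The weights are stated in units of $10^{18}$. Only $f(w_{12}) = 14 \times 10^{18}$ appears explicitly in the text above; the values used for $w_{10}$ and $w_{11}$, and the assumption that $f'$ swaps them, are inferred from the values $a \in \{2, 3, 16, 17\}$ and the total $19 \times 10^{18}$ used later in this appendix, so they should be read as assumptions of this sketch.

```python
from itertools import combinations

# Weights in units of 10^18. f(w12) is from the text; the others are inferred.
f = {"w10": 3, "w11": 2, "w12": 14}
f_prime = {"w10": 2, "w11": 3, "w12": 14}   # assumed: f' swaps w10 and w11

def weight(w, s):
    """Total weight of the set s under the weight function w."""
    return sum(w[x] for x in s)

elems = sorted(f)
subsets = [frozenset(c)
           for r in range(len(elems) + 1)
           for c in combinations(elems, r)]

# Subsets of {w10, w11, w12} on which f and f' disagree.
differing = [s for s in subsets if weight(f, s) != weight(f_prime, s)]
a_values = sorted(weight(f, s) for s in differing)

for s in differing:
    print(sorted(s), weight(f, s), weight(f_prime, s))
```

Under these assumed weights, the differing subsets are exactly the four sets listed above, and their $f$-weights are the four candidate values of $a$.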

Thus, we may as well assume that at least one of $(U_1, U_2, U_3)$ or $(V_1, V_2, V_3)$ is not good.
In that case, I claim that one of the following must hold:

• $\mathrm{Bel}_0(V_3 \cap V_2|V_1) = \mathrm{Bel}_0(V_3|V_2 \cap V_1) = \mathrm{Bel}_0(U_3|U_2 \cap U_1) = \mathrm{Bel}_0(U_3 \cap U_2|U_1) = 0$

• $U_3 \cap U_2 \cap U_1 = U_2 \cap U_1$ and $V_3 \cap V_2 \cap V_1 = V_2 \cap V_1$

• $f(U_1) = f(V_1)$ and $f(U_1 \cap U_2) = f(V_1 \cap V_2)$

In the first case, we have already seen that the lemma holds. In the second case, we have
$\mathrm{Bel}_0(V_3 \cap V_2|V_1) = \mathrm{Bel}_0(V_2|V_1)$, $\mathrm{Bel}_0(U_3 \cap U_2|U_1) = \mathrm{Bel}_0(U_2|U_1)$, and $\mathrm{Bel}_0(V_3|V_2 \cap V_1) =
\mathrm{Bel}_0(U_3|U_2 \cap U_1) = 1$, so the lemma is easily seen to hold. Finally, in the third case, notice
that since $\Pr(U_2 \cap U_3|U_1) = \Pr(V_2 \cap V_3|V_1)$, we must also have that $f(U_1 \cap U_2 \cap U_3) =
f(V_1 \cap V_2 \cap V_3)$. Moreover, it is easy to see that all these equalities must hold if $f$ is replaced
by $f'$. Again, the lemma immediately follows.

To prove the claim, for definiteness, assume that $(U_1, U_2, U_3)$ is not good (an identical
argument works if $(V_1, V_2, V_3)$ is not good). From the characterization above of triples that
are not good, it follows that $f(U_1 \cap U_2) = a \times 10^{18} + b$ and $f(U_1) = 19 \times 10^{18} + c$, where
$a \in \{2, 3, 16, 17\}$ (depending on $U_2 \cap \{w_{10}, w_{11}, w_{12}\}$), and both $b, c < 20 \times 10^{8}$. Clearly, the
relevant number closest to $\Pr(U_2|U_1)$ is $a/19$. Since $\Pr(V_2|V_1) = \Pr(U_2|U_1)$ by assumption,
$\Pr(V_2|V_1)$ is also close to $a/19$. Thus, we must have that $f(V_1 \cap V_2) = a \times 10^{k} + b'$ and
$f(V_1) = 19 \times 10^{k} + c'$, where $k \in \{0, 4, 8, 18\}$. In fact, it is easy to see that $k$ is either 8 or 18,
since there are no relevant numbers of the form $a/19$ (for $a \in \{2, 3, 16, 17\}$) that are close
to $\Pr(V|U)$ if $U \subseteq \{w_1, w_2, w_3, w_4, w_5, w_6\}$. In addition, if $k = 18$, then $b', c' < 20 \times 10^{8}$,
while if $k = 8$, then $b', c' < 20 \times 10^{4}$. By standard arithmetic manipulation, we have that

$$10^{18}(ac' - 19b') + 10^{k}(19b - ac) + (bc' - b'c) = 0.$$

If $k = 8$, then it is easy to see that we must have

$$ac' - 19b' = 0, \quad 19b - ac = 0, \quad \text{and} \quad bc' - b'c = 0, \tag{11}$$

while if $k = 18$, then we must have

$$19(b - b') + a(c' - c) = 0 \quad \text{and} \quad bc' - b'c = 0. \tag{12}$$
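For readers who want the "standard arithmetic manipulation" spelled out, the following is a sketch of one way to reach the displayed identity, starting from $\Pr(V_2|V_1) = \Pr(U_2|U_1)$ and the expressions for $f$ given above. The final step to (11) and (12), which separates the powers of 10, relies on the size bounds on $b$, $c$, $b'$, and $c'$ stated in the text and is not rederived here.

```latex
\begin{align*}
\frac{a \cdot 10^{18} + b}{19 \cdot 10^{18} + c}
  &= \frac{a \cdot 10^{k} + b'}{19 \cdot 10^{k} + c'} \\
(a \cdot 10^{18} + b)(19 \cdot 10^{k} + c')
  &= (a \cdot 10^{k} + b')(19 \cdot 10^{18} + c) \\
19a \cdot 10^{18+k} + ac' \cdot 10^{18} + 19b \cdot 10^{k} + bc'
  &= 19a \cdot 10^{18+k} + ac \cdot 10^{k} + 19b' \cdot 10^{18} + b'c \\
10^{18}(ac' - 19b') + 10^{k}(19b - ac) + (bc' - b'c) &= 0 .
\end{align*}
% When k = 18, the first two terms share the factor 10^{18}:
\[
10^{18}\bigl(19(b - b') + a(c' - c)\bigr) + (bc' - b'c) = 0 .
\]
```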

Now comes a case analysis. First suppose that $k = 8$. Then we must have $b' = c' = 0$,
since if $c' \ne 0$, then from (11) we have that $b'/c' = a/19$, and it is easy to see that there
do not exist sets $T_1$ and $T_2$ such that $f(T_1) = b'$, $f(T_2) = c'$, and $b'/c' = a/19$, with
$b', c' < 20 \times 10^{4}$. Thus, it follows that $\Pr(U_2|U_1) = \Pr(V_2|V_1) = a/19$. Moreover, we must
have $V_1 = \{w_7, w_8, w_9\}$ and $V_2 \cap V_1$ either $\{w_7\}$ or $\{w_8, w_9\}$, depending on $a$. It follows that
$\Pr(V_3|V_2 \cap V_1)$ must be one of $\{0, 1/2, 1\}$. Since $\Pr(U_3|U_2 \cap U_1) = \Pr(V_3|V_2 \cap V_1)$, we must
have that $\Pr(U_3|U_2 \cap U_1) \in \{0, 1/2, 1\}$. Since $U_2 \cap U_1$ contains exactly one of $w_{10}$ and $w_{11}$, it
is easy to see that $\Pr(U_3|U_2 \cap U_1)$ cannot be 1/2. If $\Pr(U_3|U_2 \cap U_1) = \Pr(V_3|V_2 \cap V_1) = 0$, then
$U_3 \cap U_2 \cap U_1 = V_3 \cap V_2 \cap V_1 = \emptyset$, and we must have $\mathrm{Bel}_0(U_3 \cap U_2|U_1) = \mathrm{Bel}_0(V_3 \cap V_2|V_1) = 0$,
so the claim follows. On the other hand, if $\Pr(U_3|U_2 \cap U_1) = \Pr(V_3|V_2 \cap V_1) = 1$, then
$U_3 \cap U_2 \cap U_1 = U_2 \cap U_1$ and $V_3 \cap V_2 \cap V_1 = V_2 \cap V_1$, and the claim again follows.

Now suppose $k = 18$. If $c = c'$, then by (12), we must have that $b = b'$. It immediately
follows that $f(U_1) = f(V_1)$ and $f(U_1 \cap U_2) = f(V_1 \cap V_2)$, so the claim holds. Thus, we can
suppose $c \ne c'$. Suppose that $c' \ne 0$ (an identical argument works if $c \ne 0$). Then there
exists some $x \ne 1$ such that $c = xc'$. Since $bc' - b'c = 0$, it follows that $b = xb'$. Substituting
$xb'$ for $b$ and $xc'$ for $c$ in (12), we get that $(1 - x)b'/((1 - x)c') = a/19$, from which it follows
that $b'/c' = a/19$. Moreover, we also get that either $b = c = 0$ or $b/c = a/19$. It is easy to
check that $a$ must be either 3 or 16. If $b/c = a/19$, then we must have $b = b'$ and $c = c'$.
As we have seen, this suffices to prove the claim. Thus, we can assume that $b = c = 0$. But
this means that $U_1 = \{w_{10}, w_{11}, w_{12}\}$, and that $U_1 \cap U_2$ is either $\{w_{10}\}$ or $\{w_{11}, w_{12}\}$. It
follows that the only possibilities for $\Pr(U_3|U_2 \cap U_1)$ are 0, 1/8, 7/8, or 1. It is easy to see
that $\Pr(V_3|V_2 \cap V_1)$ cannot be 1/8 or 7/8, while the cases where it is either 0 or 1 are easily
taken care of, as above.
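The small sets of possible conditional probabilities quoted in the case analysis can be reproduced by enumeration. In the sketch below, weights are given in units of $10^{18}$ for $w_{10}, w_{11}, w_{12}$ and in units of $10^{8}$ for $w_7, w_8, w_9$; since only ratios within one group are computed, the units cancel. Apart from $f(w_9)$ and $f(w_{12})$, the individual weights are inferred from the fractions quoted above rather than taken from this section, so they are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations

def conditional_values(weights, base):
    """All values Pr(X | base) for X ranging over the subsets of base."""
    base = sorted(base)
    total = sum(weights[x] for x in base)
    return {Fraction(sum(weights[x] for x in c), total)
            for r in range(len(base) + 1)
            for c in combinations(base, r)}

# Assumed weights (see the lead-in); units cancel within each group.
f_top = {"w10": 3, "w11": 2, "w12": 14}
f_mid = {"w7": 3, "w8": 8, "w9": 8}

u_values = (conditional_values(f_top, {"w10"})
            | conditional_values(f_top, {"w11", "w12"}))
v_values = (conditional_values(f_mid, {"w7"})
            | conditional_values(f_mid, {"w8", "w9"}))

print(sorted(u_values))
print(sorted(v_values))
print(sorted(u_values & v_values))
```

Under these assumed weights, the only values common to both sides are 0 and 1, which is why the $k = 8$ case reduces to those two possibilities.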

This completes the proof of the claim and of the lemma. □ 